Importing necessary libraries and data

Data Overview

Observation

  1. Ticker Symbol, Security, GICS Sector and GICS Sub Industry are object variables. The rest are numerical variables.
  2. Current Price, Price Change, Volatility, Earnings Per Share, Estimated Shares Outstanding, P/E Ratio and P/B Ratio are float variables.
  3. All the object variables should be converted to categorical variables.

The Dataset has 340 rows and 15 columns.
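
The shape and data-type observations above can be reproduced with a few pandas calls. The sketch below is illustrative only: the three-row frame is a made-up stand-in for the real data, and the name `df` is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for the stock data; rows and values are made up.
df = pd.DataFrame(
    {
        "Ticker Symbol": ["AAA", "BBB", "CCC"],
        "GICS Sector": ["Industrials", "Financials", "Energy"],
        "Current Price": [42.35, 59.24, 44.91],
        "ROE": [135, 130, 21],
    }
)

n_rows, n_cols = df.shape  # (number of rows, number of columns)

# Split the columns into numeric and non-numeric (text) variables.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(exclude="number").columns.tolist()

print(f"{n_rows} rows, {n_cols} columns")
print("numeric:", numeric_cols)
print("non-numeric:", text_cols)
```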

Checking for duplicates

Observations

There are no duplicate rows.
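
A duplicate check of this kind is usually a one-liner in pandas. The sketch below uses a made-up frame in which one row is deliberately repeated, so the count is 1 here rather than the 0 reported above.

```python
import pandas as pd

# Made-up rows; the last row repeats the first on purpose.
df = pd.DataFrame(
    {
        "Ticker Symbol": ["AAA", "BBB", "AAA"],
        "Current Price": [42.35, 59.24, 42.35],
    }
)

# duplicated() flags every row that is identical to an earlier row.
n_duplicates = int(df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each repeated row.
deduplicated = df.drop_duplicates()

print("duplicate rows:", n_duplicates)
```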

Observation

  1. The mean of Current Price, Volatility, ROE, Cash Ratio, Estimated Shares Outstanding, P/E Ratio and P/B Ratio is greater than the median (the 50% value), so these variables are positively skewed.
  2. Current Price varies between 4.5 and about 1235.
  3. Price Change ranges between about -47 and 55.05.
  4. A negative P/B ratio indicates negative book value (liabilities exceed assets), which can be a sign that the company has been losing money.
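
The skew argument above (a mean that sits above the median) can be checked directly. This is a minimal sketch on made-up prices; only the minimum and maximum are borrowed from the observations, everything else is illustrative.

```python
import pandas as pd

# Made-up prices with one very large value, mimicking a long right tail.
prices = pd.Series([4.5, 20.0, 35.0, 60.0, 90.0, 1235.0], name="Current Price")

mean_price = prices.mean()
median_price = prices.median()  # the "50%" row of describe()
skewness = prices.skew()        # positive for a right-skewed distribution

print(f"mean={mean_price:.2f}, median={median_price:.2f}, skew={skewness:.2f}")
```

A mean well above the median, together with a positive skewness, is the pattern described in item 1.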

Observation

  1. Security is unique for each company. There is no repetition.
  2. GICS Sector has 11 unique values. Industrials is the most frequent, appearing 53 times.
  3. GICS Sub Industry has 104 unique values. Oil & Gas Exploration and Production is the most frequent, appearing 16 times.
  4. Ticker Symbol has 340 unique values.

Missing Values

Observations

There are no missing values in the data provided.
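
A missing-value check typically counts nulls per column. The sketch below uses a made-up frame with one deliberate gap, so it reports one missing value where the real data reports none.

```python
import numpy as np
import pandas as pd

# Made-up frame with a single missing price.
df = pd.DataFrame(
    {
        "Current Price": [42.35, np.nan, 44.91],
        "ROE": [135, 130, 21],
    }
)

missing_per_column = df.isnull().sum()  # null count for each column
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("total missing:", total_missing)
```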

Observation

  1. Security is unique for each company. There is no repetition.
  2. GICS Sector has 11 unique values. Industrials is the most frequent, appearing 53 times, followed by Financials.
  3. GICS Sub Industry has 104 unique values. Oil & Gas Exploration and Production is the most frequent, appearing 16 times, followed by REITs.
  4. Ticker Symbol is unique for each company.

Check for unique values in the column

Convert the object variables to Categorical Variables.

All the object variables are converted to categorical variables.
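
One way to do this conversion is to cast every non-numeric column to the pandas `category` dtype. The sketch below assumes a frame named `df` with made-up rows. It shows one possible approach, not necessarily the one used in the analysis.

```python
import pandas as pd

# Hypothetical stand-in for the stock data.
df = pd.DataFrame(
    {
        "Ticker Symbol": ["AAA", "BBB", "CCC"],
        "GICS Sector": ["Industrials", "Financials", "Industrials"],
        "Current Price": [42.35, 59.24, 44.91],
    }
)

# Cast each non-numeric column to the categorical dtype.
for col in df.select_dtypes(exclude="number").columns:
    df[col] = df[col].astype("category")

print(df.dtypes)
print(df["GICS Sector"].cat.categories.tolist())
```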

There are no missing values.

There are no null values in the given dataset.

Exploratory Data Analysis (EDA)

Questions:

  1. What does the distribution of stock prices look like?
  2. The stocks of which economic sector have seen the maximum price increase on average?
  3. How are the different variables correlated with each other?
  4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?
  5. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?

Univariate Analysis

Observation

  1. Current Price, Volatility, ROE, Cash Ratio, Estimated Shares Outstanding and P/E Ratio are positively skewed. There are some outliers.
  2. We see a bell-shaped curve for Price Change, Net Cash Flow, Net Income, Earnings Per Share and P/B Ratio, with outliers at both ends.
  3. In most cases, Current Price is less than 100, but some stocks go up to about 1300.
  4. Price Change ranges between -45 and about 60.
  5. Volatility ranges between 0.5 and 1.0.
  6. ROE is mostly less than 50.
  7. In most cases, Cash Ratio is less than 100, but it can go up to 900.
  8. Although Net Cash Flow ranges between -1.5 and 2.3, it mostly lies between -0.25 and 0.25.
  9. In most cases, Earnings Per Share is less than 5.
  10. P/E Ratio is always greater than 0. It is mostly less than 25, with the highest value around 525.
  11. In some cases, P/B Ratio is less than 0. A low P/B ratio could also mean the company is earning a very poor return on its assets.

Observation

  1. For 90% of the stocks, Current Price is less than 120.
  2. Price Change varies between -40 and 40.
  3. For 95% of the stocks, Volatility is less than 2.5.
  4. For 98% of the stocks, ROE is less than 200.
  5. 95% of the stocks have a Cash Ratio less than 200.

Observation

  1. 15.6% of the stocks belong to the Industrials sector, followed by Financials.
  2. Only 1.5% of the stocks belong to Telecommunications Services.
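
Sector shares like these are relative frequencies. For example, 53 Industrials stocks out of 340 gives 53 / 340 ≈ 15.6%. A minimal sketch with made-up sector labels:

```python
import pandas as pd

# Made-up sector labels; the real column has 340 entries across 11 sectors.
sectors = pd.Series(
    ["Industrials", "Industrials", "Financials", "Energy"],
    name="GICS Sector",
)

# normalize=True returns fractions instead of counts; convert to percent.
share_pct = sectors.value_counts(normalize=True).mul(100).round(1)

print(share_pct)
```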

Bivariate Analysis

Checking correlations

Observation

A. The highest correlation is between Estimated Shares Outstanding and Net Income (0.59), followed by Earnings Per Share and Net Income.
B. There is negative correlation between the following pairs, meaning that as one value increases the other tends to decrease:
  1. Volatility and Current Price, Price Change, Net Cash Flow, Net Income, Earnings Per Share and Estimated Shares Outstanding.
  2. Current Price and ROE, Net Cash Flow.
  3. Price Change and ROE.
  4. ROE and every other variable except Volatility and P/E Ratio.
  5. Net Cash Flow and Current Price, Volatility and Estimated Shares Outstanding.
  6. Earnings Per Share and Estimated Shares Outstanding, P/E Ratio.
  7. P/E Ratio and Price Change, Net Income, Earnings Per Share and Estimated Shares Outstanding.
C. In the pair plot, all variables look roughly normally distributed except P/E Ratio, which is bimodal.
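
Pairwise correlations such as those listed above can be ranked programmatically instead of being read off a heatmap. The sketch below uses synthetic columns that are constructed to correlate. The column names are borrowed from the dataset, but the numbers are random and do not reproduce the reported values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
net_income = rng.normal(size=200)

# Synthetic data: one column built to move with Net Income, one against it.
df = pd.DataFrame(
    {
        "Net Income": net_income,
        "Earnings Per Share": net_income + rng.normal(scale=0.5, size=200),
        "Volatility": -net_income + rng.normal(scale=0.5, size=200),
    }
)

corr = df.corr()  # Pearson correlation matrix

# Keep each pair once (upper triangle, diagonal excluded), then sort.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)

print(pairs)
```

The first entries of `pairs` are the strongest positive correlations and the last entries are the strongest negative ones.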

GICS Sector Vs Current Price

Observation

  1. The Current Price for Health Care stocks is slightly higher than for Consumer Discretionary stocks.
  2. Telecommunications Services stocks have the lowest Current Price, followed by Energy.

GICS Sector Vs Price Change

Observation

  1. Price Change is lowest for Energy, where it is negative, followed by Utilities.
  2. Price Change is highest for Health Care stocks, followed by Consumer Staples.

GICS Sector Vs Volatility

Observations

  1. Volatility is highest for Energy stocks, followed by Materials and Consumer Discretionary.
  2. Volatility is lowest for Consumer Staples.

GICS Sector Vs ROE

Observation

  1. ROE is lowest for Utilities and Real Estate stocks.
  2. ROE is highest for Energy stocks, followed by Consumer Staples.

GICS Sector Vs Cash Ratio

Observation

  1. Cash Ratio is highest for Information Technology stocks and lowest for Utilities.

GICS Sector Vs Net Cash Flow

Observation

  1. Net Cash Flow is negative for Telecommunications Services, Energy and Industrials.
  2. It is negligible for Real Estate stocks.

GICS Sector Vs Net Income

Observation

  1. Net Income is negative for Energy stocks.
  2. Among the sectors with positive Net Income, it is lowest for Materials and Real Estate stocks.

GICS Sector Vs Earnings Per Share

Observation

  1. Earnings Per Share is negative for Energy.

GICS Sector Vs Estimated Shares Outstanding

Observation

  1. Estimated Shares Outstanding is highest for Telecommunications Services, followed by Consumer Staples.
  2. It is lowest for Materials.

GICS Sector Vs P/E Ratio

Observation

  1. P/E Ratio is highest for Energy and lowest for Telecommunications Services.

GICS Sector Vs P/B Ratio

Observation

  1. P/B Ratio is positive and highest for Information Technology stocks, followed by Energy.
  2. P/B Ratio is negative for the rest of the sectors.

QUESTIONS

1. What does the distribution of stock prices look like?

Observation

  1. Current Price is positively skewed.
  2. There are some outliers, with the highest values around 1250.
  3. Most of the stocks have a Current Price less than 80.
  4. Most of the stocks have a Current Price between 20 and 60.

2. The stocks of which economic sector have seen the maximum price increase on average?

Observation

  1. Stocks in the Health Care sector have seen the maximum price increase on average.
  2. The second highest is Consumer Staples.
  3. The average price change is negative for the Energy sector.
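
Sector averages like the ones above come from a group-by aggregation. The sketch below uses made-up rows. The column names follow the dataset, but the values are illustrative and merely arranged to echo the ranking described.

```python
import pandas as pd

# Made-up rows for three sectors.
df = pd.DataFrame(
    {
        "GICS Sector": ["Health Care", "Health Care", "Energy", "Energy", "Utilities"],
        "Price Change": [10.0, 8.0, -12.0, -8.0, 1.0],
    }
)

# Mean price change per sector, largest first.
avg_change = (
    df.groupby("GICS Sector")["Price Change"].mean().sort_values(ascending=False)
)

print(avg_change)
```

The same pattern, with a different column in place of `"Price Change"`, answers the questions about average Cash Ratio and P/E Ratio by sector.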

3. How are the different variables correlated with each other?

Observation

A. The highest correlation is between Estimated Shares Outstanding and Net Income (0.59), followed by Earnings Per Share and Net Income.
B. There is negative correlation between the following pairs, meaning that as one value increases the other tends to decrease:
  1. Volatility and Current Price, Price Change, Net Cash Flow, Net Income, Earnings Per Share and Estimated Shares Outstanding.
  2. Current Price and ROE, Net Cash Flow.
  3. Price Change and ROE.
  4. ROE and every other variable except Volatility and P/E Ratio.
  5. Net Cash Flow and Current Price, Volatility and Estimated Shares Outstanding.
  6. Earnings Per Share and Estimated Shares Outstanding, P/E Ratio.
  7. P/E Ratio and Price Change, Net Income, Earnings Per Share and Estimated Shares Outstanding.

4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?

Observation

A high Cash Ratio is preferred, as it indicates that a company can easily pay its short-term debt.

  1. Cash Ratio is highest in the Information Technology sector, followed by Telecommunications Services and then Health Care.
  2. Cash Ratio is lowest for Utilities and Industrials.

5. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?

Observation

  1. P/E Ratio is highest in the Energy sector, meaning investors are willing to pay more per dollar of earnings for Energy stocks.
  2. P/E Ratio is lowest in the Telecommunications Services sector, followed by Financials.

Data Preprocessing

Missing Value Treatment

Check for missing value

Observation

There are no missing values. Hence, there is no need for missing value treatment.

Duplicate Rows Check

Observations

There are no duplicate rows

Feature Engineering

Outlier Treatment

The outliers are treated.

The data is scaled and is now ready for clustering.
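
The text does not say which outlier treatment or scaler was used. As an assumption, the sketch below clips each column to its IQR whiskers (1.5 × IQR beyond the quartiles) and then standardizes with scikit-learn's `StandardScaler`. The helper `clip_to_whiskers` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clip_to_whiskers(col: pd.Series) -> pd.Series:
    """Clip values lying more than 1.5 * IQR outside the quartiles (assumed method)."""
    q1, q3 = col.quantile(0.25), col.quantile(0.75)
    iqr = q3 - q1
    return col.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)


# Made-up numeric columns, each with one extreme value.
df = pd.DataFrame(
    {
        "Current Price": [20.0, 35.0, 50.0, 60.0, 75.0, 1235.0],
        "ROE": [5.0, 10.0, 12.0, 15.0, 20.0, 900.0],
    }
)

treated = df.apply(clip_to_whiskers)  # column-wise outlier clipping

# Standardize to zero mean and unit variance so no variable dominates distances.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(treated), columns=treated.columns
)

print(treated.max())
print(scaled.round(2))
```

Scaling matters for distance-based clustering because variables on large scales would otherwise dominate the distance computation.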

EDA

Univariate Analysis

Observation

After outlier treatment and scaling:

  1. Current Price is approximately normally distributed. The highest Current Price has come down to 175. The Current Price of most of the stocks is between 25 and 75.
  2. Price Change now ranges between -18 and 27. For most of the stocks, Price Change is between 0 and 10.
  3. Volatility is most frequent between 1.00 and 1.25.
  4. Most of the stocks have ROE less than 20.
  5. Most of the stocks have Cash Ratio below 50.
  6. Most of the stocks have Net Cash Flow = 1.
  7. Net Income is less than 1 in most cases.
  8. Earnings Per Share is between 1 and 3 for most of the stocks.
  9. Estimated Shares Outstanding is less than 0.3 in most cases.
  10. P/B Ratio is between -15 and 15. A low positive P/B ratio can suggest that a stock is undervalued, whereas a negative P/B ratio reflects negative book value.

Observation

  1. For 90% of the stocks, Current Price is less than 120.
  2. Price Change varies between -17 and 27.
  3. For 90% of the stocks, Volatility is less than 2.1.
  4. For 85% of the stocks, ROE is less than 35.
  5. 90% of the stocks have a Cash Ratio less than 130.

Observation

  1. 15.6% of the stocks belong to the Industrials sector, followed by Financials.
  2. Only 1.5% of the stocks belong to Telecommunications Services.

Bivariate Analysis

Observation

A. The highest correlation is between Earnings Per Share and Net Income (0.60), followed by Estimated Shares Outstanding and Net Income (0.54).
B. There is negative correlation between the following pairs, meaning that as one value increases the other tends to decrease:
  1. Volatility and Current Price, Price Change, Net Cash Flow, Net Income, Earnings Per Share and Estimated Shares Outstanding.
  2. Current Price and ROE, Net Cash Flow.
  3. Price Change and ROE.
  4. ROE and every other variable except Volatility and P/E Ratio.
  5. Net Cash Flow and Current Price, Volatility and Estimated Shares Outstanding.
  6. Earnings Per Share and Estimated Shares Outstanding, P/E Ratio.
  7. P/E Ratio and Price Change, Net Income, Earnings Per Share and Estimated Shares Outstanding.

Observation

  1. After the treatment, all the distributions appear bimodal.

QUESTIONS

1. What does the distribution of stock prices look like?

Observation

  1. Current Price is positively skewed.
  2. There are no outliers; the highest value is about 175.
  3. Most of the stocks have a Current Price less than 70.
  4. Most of the stocks have a Current Price around 50.

2. The stocks of which economic sector have seen the maximum price increase on average?

Observation

  1. Stocks in the Health Care sector have seen the maximum price increase on average.
  2. The second highest is Consumer Staples.
  3. The average price change is negative for the Energy sector.

3. How are the different variables correlated with each other?

Observation

A. The highest correlation is between Earnings Per Share and Net Income (0.60), followed by Estimated Shares Outstanding and Net Income (0.54).
B. There is negative correlation between the following pairs, meaning that as one value increases the other tends to decrease:
  1. Volatility and Current Price, Price Change, Net Cash Flow, Net Income, Earnings Per Share and Estimated Shares Outstanding.
  2. Current Price and ROE, Net Cash Flow.
  3. Price Change and ROE.
  4. ROE and every other variable except Volatility and P/E Ratio.
  5. Net Cash Flow and Current Price, Volatility and Estimated Shares Outstanding.
  6. Earnings Per Share and Estimated Shares Outstanding, P/E Ratio.
  7. P/E Ratio and Price Change, Net Income, Earnings Per Share and Estimated Shares Outstanding.

4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?

Observation

  1. A high Cash Ratio is preferred, as it indicates that a company can easily pay its short-term debt.
  2. Cash Ratio is highest in the Information Technology sector, followed by Telecommunications Services and then Health Care.
  3. Cash Ratio is lowest for Utilities and Industrials.

5. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?

Observation

  1. P/E Ratio is highest in the Energy sector, meaning investors are willing to pay more per dollar of earnings for Energy stocks.
  2. P/E Ratio is lowest in the Telecommunications Services sector, followed by Financials.

K-means Clustering

Based on the elbow curve, the appropriate value of K seems to be 3 or 4.
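
An elbow analysis fits K-means for a range of K and looks for the point where adding clusters stops reducing the within-cluster error by much. The sketch below is an illustration on synthetic blobs with three known centers. It uses scikit-learn's `inertia_` (the within-cluster sum of squares) as the error measure, which is an assumption about the metric.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (not the stock data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.6,
    random_state=1,
)

# Within-cluster sum of squares for K = 1..7.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
    for k in range(1, 8)
}

# Improvement gained by moving from K-1 to K clusters; the elbow is where this collapses.
drops = {k: inertias[k - 1] - inertias[k] for k in range(2, 8)}

for k, drop in drops.items():
    print(f"K={k}: inertia falls by {drop:.1f}")
```

On this synthetic data the drop collapses after K = 3. On real data the elbow is often less sharp, which is why two candidate values can look equally plausible.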

Checking the silhouette scores now.

The higher the silhouette score, the better the clustering. The silhouette score is highest for n_clusters = 3.
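
Average silhouette scores can be compared across candidate values of `n_clusters` as sketched below. The data are synthetic blobs with three known centers, so the result here says nothing about the stock data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (not the stock data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.6,
    random_state=1,
)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    # Mean silhouette over all points; ranges from -1 (poor) to 1 (well separated).
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"n_clusters={k}: silhouette={score:.3f}")
print("best:", best_k)
```

Per-point silhouette values (from `sklearn.metrics.silhouette_samples`) are what a silhouette plot draws. Negative values there mark points that sit closer to a neighboring cluster than to their own.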

Observation

After considering the average silhouette scores and comparing them with the silhouette plots, we observe that:

  1. n_clusters = 6, 7 and 8 look reasonable, as the clusters are of nearly similar size, although one cluster is noticeably larger than the rest. Each of these also contains negative silhouette scores.
  2. n_clusters = 6 has 5 negative silhouette scores, while the others have 6.
  3. Hence, we will consider n_clusters = 6, as its silhouette score is high enough and there is a kink at 6 in the elbow curve.

Cluster Profiling

Observation

  1. Cluster 0 has high Price Change, Cash Ratio and P/B Ratio.
  2. Cluster 1 has the highest number of companies.
  3. Cluster 2 has higher Current Price, ROE and Earnings Per Share.
  4. Cluster 3 has high Net Cash Flow, Net Income and Estimated Shares Outstanding.
  5. Cluster 4 has high Volatility and P/E Ratio.
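
A cluster profile of this kind is usually the per-cluster mean of each variable plus a count of members. The sketch below assumes the cluster labels have already been attached to the frame in a column named `cluster` (a hypothetical name). The rows are made up.

```python
import pandas as pd

# Made-up rows with hypothetical cluster labels (e.g. taken from KMeans.labels_).
df = pd.DataFrame(
    {
        "Current Price": [30.0, 40.0, 500.0, 600.0],
        "Volatility": [1.0, 1.2, 2.5, 2.7],
        "cluster": [0, 0, 1, 1],
    }
)

# Mean of every variable within each cluster.
profile = df.groupby("cluster").mean()

# Number of companies per cluster, aligned on the cluster label.
profile["count"] = df["cluster"].value_counts().sort_index()

print(profile)
```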

Insights

Cluster 0

  1. Cash Ratio is high.
  2. P/B Ratio is high. P/E Ratio, Net Cash Flow and Volatility are moderate.
  3. Current Price and Price Change are about equal.
  4. ROE, Net Income, Earnings Per Share and Estimated Shares Outstanding are negative.

Cluster 1

  1. All the attributes are negative.

Cluster 2

  1. Current Price and Earnings Per Share are high.
  2. Price Change is low and ROE is moderate.
  3. Net Income is very low.
  4. Volatility, Cash Ratio, Estimated Shares Outstanding and P/E Ratio are negative.

Cluster 3

  1. Current Price, Volatility, P/E Ratio and P/B Ratio are negative.
  2. Net Cash Flow, Net Income and Estimated Shares Outstanding are high.
  3. Price Change, ROE and Earnings Per Share are moderate.
  4. Cash Ratio is low.

Cluster 4

  1. Volatility is very high and P/E Ratio is high.
  2. ROE and P/B Ratio are moderate.
  3. Estimated Shares Outstanding is low.
  4. Price Change (very low), Current Price (very low), Cash Ratio, Net Cash Flow, Net Income (very low) and Earnings Per Share (very low) are negative.

Cluster 5

  1. Net Income and Estimated Shares Outstanding are high.
  2. Price Change and Earnings Per Share are moderate.
  3. ROE, Cash Ratio and P/B Ratio are low.
  4. Current Price, Estimated Shares Outstanding and Volatility are negative (low), and Net Cash Flow is very low (negative).

Hierarchical Clustering

Observation

  1. The cophenetic correlation for Euclidean distance and average linkage is 0.7326. This is the highest value among the combinations tried, so it is the one we will consider.

Let us explore different linkage methods with Euclidean distance alone.

Let's see the dendrograms for the different linkage methods.

Observations

  1. The cophenetic correlation for Euclidean distance and average linkage is 0.73, the highest value among the linkage methods.
  2. Hence, we will consider average linkage.
  3. Number of clusters = 6 appears to be appropriate.
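
The cophenetic correlation measures how faithfully a dendrogram preserves the original pairwise distances. The closer it is to 1, the better. The sketch below compares linkage methods with SciPy on random stand-in data, so its values will not match the 0.73 reported above.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))  # random stand-in for the scaled data

# Original pairwise Euclidean distances (condensed form).
distances = pdist(X, metric="euclidean")

results = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    # cophenet returns the correlation and the cophenetic distances.
    coph_corr, _ = cophenet(Z, distances)
    results[method] = coph_corr

best_method = max(results, key=results.get)

for method, value in results.items():
    print(f"{method}: {value:.4f}")
print("highest cophenetic correlation:", best_method)
```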

Let's create 6 clusters

Cluster Profiling

Observation

  1. Cluster 1 has 290 companies, cluster 0 has 31 companies, cluster 2 has 15 companies, cluster 5 has 2 companies, and clusters 3 and 4 have 1 company each.
  2. The cluster sizes are highly unbalanced. Let's see how Ward linkage performs, as its dendrogram shows more distinct clusters.
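
The imbalance described above can be checked by counting members per cluster for each linkage method. The sketch below uses scikit-learn's `AgglomerativeClustering` on random stand-in data. The class choice and `n_clusters=6` are assumptions for illustration, and the sizes printed will differ from those reported.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # random stand-in for the scaled data

sizes = {}
for method in ["average", "ward"]:
    model = AgglomerativeClustering(n_clusters=6, linkage=method)
    labels = model.fit_predict(X)
    # Number of points assigned to each of the six clusters.
    sizes[method] = np.bincount(labels, minlength=6)

for method, counts in sizes.items():
    print(f"{method}: {counts.tolist()}")
```

Average linkage tends to isolate outlying points as tiny clusters, while Ward linkage minimizes within-cluster variance and often produces more evenly sized groups.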

Let's try Ward linkage.

Cluster Profiling

Observation

  1. Cluster 0 has 197 companies.
  2. Cluster 4 has 25 companies.
  3. Cluster 3 has 23 Companies.
  4. Cluster 1 has 52 companies.
  5. Cluster 2 has 43 companies.

Observation

  1. Cluster 0 has the highest number of companies.
  2. Cluster 1 has high Price Change, Cash Ratio, Net Cash Flow and P/B Ratio.
  3. Cluster 2 has high Net Income and Estimated Shares Outstanding.
  4. Cluster 3 has high Volatility and P/E Ratio.
  5. Cluster 4 has high Current Price, ROE and Earnings Per Share.

Let's compare Cluster vs GICS Sector

Insights

Cluster 0

  1. This has all negative values.

Cluster 1

  1. ROE, Earnings Per Share and Net Cash Flow are negative.
  2. Current Price is very low.
  3. Price Change, Volatility, Estimated Shares Outstanding and P/E Ratio are high.
  4. Cash Ratio is very high.
  5. P/B Ratio is high.

Cluster 2

  1. Current Price, Volatility, P/B Ratio and P/E Ratio are negative.
  2. Price Change, Cash Ratio and Net Cash Flow are low.
  3. Net Income and Estimated Shares Outstanding are very high.

Cluster 3

  1. Current Price, Price Change, Net Income, Earnings Per Share, Cash Ratio and Net Cash Flow are negative.
  2. Volatility is very high.
  3. ROE is very high.
  4. Estimated Shares Outstanding and P/B Ratio are very low.

Cluster 4

  1. Volatility, Cash Ratio, Net Cash Flow, Estimated Shares Outstanding and P/E Ratio are negative.
  2. Current Price and Earnings Per Share are very high.
  3. ROE, P/B Ratio and Net Income are moderate.
  4. Price Change is low.

K-means vs Hierarchical Clustering

Hierarchical clustering took a lot of time to create the dendrograms.

Actionable Insights and Recommendations

We have found companies with some very similar metrics as follows: